Methods and Applications for High-Frequency Biosignals Data

Lily Koffman

Department of Biostatistics, Johns Hopkins School of Public Health

Introduction: accelerometry data

Introduction: big accelerometry data

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer?

Problem setup

Big picture method: time series to scalar predictors

Details of the method

For each second and each person:

  • Obtain joint distribution of acceleration and lag acceleration for a series of lags

  • Calculate scalar summaries of the joint distribution

  • I will walk through the process for one second, one person, and one lag

  • Intuition: walking is a cyclic process, and we want to leverage that cyclic structure.
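
The steps in this list can be sketched for one second, one person, and one lag. The snippet below is a minimal illustration rather than the implementation used in the talk: the simulated signal, the lag of 15 samples, the cut points, and the helper names `lag_pairs` and `cell_counts` are all assumptions, and counting points in grid cells is only one plausible scalar summary of the joint distribution.

```r
# Simulated stand-in for one second of acceleration magnitude at 100 Hz
# (hypothetical data: a 2 Hz sinusoid plus noise mimics a cyclic signal).
set.seed(1)
S <- 100
v <- 1 + 0.5 * sin(2 * pi * 2 * (1:S) / S) + rnorm(S, sd = 0.05)

# Pair each observation with the observation `lag` samples earlier.
lag_pairs <- function(v, lag) {
  n <- length(v)
  data.frame(accel = v[(lag + 1):n], lag_accel = v[1:(n - lag)])
}

# Summarize the joint distribution by counting points in grid cells.
# The cut points are illustrative assumptions, not values from the talk.
cell_counts <- function(pairs, cuts = c(0, 0.5, 1, 1.5, 2, 3)) {
  table(
    cut(pairs$lag_accel, cuts, include.lowest = TRUE),
    cut(pairs$accel, cuts, include.lowest = TRUE)
  )
}

pairs_15 <- lag_pairs(v, lag = 15)
counts_15 <- cell_counts(pairs_15)
predictors <- as.vector(counts_15)  # one scalar predictor per grid cell
```

Repeating this over several lags and over every second of walking would give one row of predictors per second, which is the shape a classifier needs.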

Obtain joint distribution of acceleration and lag acceleration

Derive predictors from joint distribution

Repeat for multiple lags

Repeat for multiple seconds

Fingerprints summarize predictors for a given lag and are different across individuals

Ready to fit models

Results

Koffman et al. (2023)

Why not use the entire joint distribution?

Functional regression approach

Toy example: 4 observations per second, 2 seconds, 1 individual

\(v_1(2)\): 2nd acceleration observation in second 1

data \[\begin{bmatrix} v_1(1) & v_1(2) & v_1(3) & v_1(4) \\ v_2(1) & v_2(2) & v_2(3) & v_2(4) \\ \end{bmatrix} \]

acceleration matrix \[\begin{bmatrix} v_1(2) & v_1(3) & v_1(4) & v_1(3) & v_1(4) & v_1(4) \\ v_2(2) & v_2(3) & v_2(4) & v_2(3) & v_2(4) & v_2(4) \\ \end{bmatrix} \]

lag acceleration matrix \[\begin{bmatrix} v_1(1) & v_1(1) & v_1(1) & v_1(2) & v_1(2) & v_1(3) \\ v_2(1) & v_2(1) & v_2(1) & v_2(2) & v_2(2) & v_2(3) \\ \end{bmatrix} \]

lag matrix \[\begin{bmatrix} 1 & 2 & 3 & 1 & 2 & 1\\ 1 & 2 & 3 & 1 & 2 & 1\\\end{bmatrix} \]
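
The three matrices in the toy example can be built mechanically from the data matrix. The sketch below is one way to do it in base R; the helper name `build_lag_matrices` and the numeric encoding of the toy data are assumptions made for illustration.

```r
# Build the acceleration, lag-acceleration, and lag matrices from a
# seconds-by-samples data matrix. Column order follows the toy example:
# loop over the lagged index t, then over the lag u.
build_lag_matrices <- function(data) {
  S <- ncol(data)
  idx <- do.call(rbind, lapply(seq_len(S - 1), function(t) {
    data.frame(t = t, u = seq_len(S - t))
  }))
  list(
    accel_mat     = data[, idx$t + idx$u, drop = FALSE],
    lag_accel_mat = data[, idx$t, drop = FALSE],
    lag_mat       = matrix(idx$u, nrow = nrow(data), ncol = nrow(idx),
                           byrow = TRUE)
  )
}

# Toy data: the entry 10 * j + s stands in for v_j(s), so 12 means v_1(2).
toy <- matrix(c(11, 12, 13, 14,
                21, 22, 23, 24), nrow = 2, byrow = TRUE)
mats <- build_lag_matrices(toy)
```

With \(S\) observations per second each matrix has \(S(S-1)/2\) columns, which is 6 in the toy example.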

Functional regression approach

Model outcomes as:

\[Y_{ij}^{i_0}\sim\text{Bernoulli}(p_{ij}^{i_0})\]

where \(Y_{ij}^{i_0} = 1\) if second \(j\) of subject \(i\) belongs to subject \(i_0\) (that is, \(i = i_0\)), and 0 otherwise

Model:

\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}dsdu \]

\(u = 1, \dots, S\): lag, where \(S = 100\) is the number of observations per second

\(v_{ij}(s)\) = acceleration at centisecond \(s\) for subject \(i\) in second \(j\)

\(F_{i_0}(\cdot, \cdot, \cdot)\): subject-specific trivariate smooth function, defined at every point in the domain of acceleration, lag acceleration, and lag

Fit using penalized splines with a quadratic penalty on the functional coefficient (Wood 2016)

Functional regression approach: implementation

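# Rows of each matrix are seconds; columns are (acceleration, lag acceleration, lag) triples
model = mgcv::gam(
  Y_mat ~ te(
    accel_mat,
    lag_accel_mat,
    lag_mat,
    k = c(5, 5, 5),     # basis functions per margin
    by = weight_mat),   # weights of the linear functional
  family = binomial(),  # logit link for the one-vs-rest outcome
  method = "REML"       # smoothing parameter selection
)

  • \(\texttt{te()}\): tensor product smooth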

  • \(\texttt{k = c(5, 5, 5)}\) number of basis functions for each dimension of the tensor product smooth

  • \(\texttt{weight\_mat}\): matrix of weights of linear functionals of smooth terms. We use equal weights so the \(i,j^{\mathrm{th}}\) entry is \(\texttt{1/nrow(accel\_mat)}\)

  • \(\texttt{method="REML"}\): smoothing parameter selection with restricted maximum likelihood
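
One way to read the \(\texttt{by = weight\_mat}\) argument: when the arguments of a smooth are matrices, mgcv evaluates the smooth at every entry and sums across the columns of each row, weighting each term by the matching entry of the \(\texttt{by}\) matrix. Under that reading, the fitted model is a discretized version of the double integral. In the sketch below, \(k\) indexes the \(K\) columns of the three matrices and \(w_k\) is the equal weight; both symbols are introduced here for illustration only.

\[\text{logit}(p_{ij}^{i_0}) \approx \beta_0^{i_0} + \sum_{k=1}^{K} w_k F_{i_0}\{ v_{ij}(s_k), v_{ij}(s_k - u_k), u_k\}\]

In the toy example \(K = 6\), and column \(k = 2\) corresponds to \((s_k, u_k) = (3, 2)\): acceleration \(v(3)\), lag acceleration \(v(1)\), lag 2.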

Why not use fancier models?

Koffman, Crainiceanu, and Leroux (2024)

Rank-1 (rank-5) % accuracies

153 person dataset

3 min of walking each

Two sessions at least 1 week apart

  • Train and test on session 1
    • Logistic regression: 92 (97)
    • XGBoost: 93 (99)
    • Functional regression: 98 (100)
  • Train on session 1, test on session 2
    • Logistic regression: 41 (75)
    • XGBoost: 58 (78)
    • Functional regression: 53 (69)
Koffman, Crainiceanu, and Leroux (2024)
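
For readers unfamiliar with the metric, rank-\(k\) accuracy can be computed as in the sketch below. It assumes the per-second predictions have already been aggregated into one probability per test subject and candidate identity; the matrix `prob`, the vector `truth`, and the helper name are hypothetical.

```r
# Rank-k accuracy: share of test subjects whose true identity is among
# the k highest predicted probabilities.
# `prob` is a subjects-by-candidates matrix; row i holds the predicted
# probabilities that test subject i is each candidate (hypothetical input).
rank_k_accuracy <- function(prob, truth, k) {
  hits <- vapply(seq_len(nrow(prob)), function(i) {
    top_k <- order(prob[i, ], decreasing = TRUE)[seq_len(k)]
    truth[i] %in% top_k
  }, logical(1))
  mean(hits)
}

prob <- rbind(
  c(0.7, 0.2, 0.1),  # subject 1: true identity ranked 1st
  c(0.5, 0.3, 0.2),  # subject 2: true identity ranked 2nd
  c(0.6, 0.3, 0.1)   # subject 3: true identity ranked 3rd
)
truth <- 1:3
rank1 <- rank_k_accuracy(prob, truth, k = 1)  # 1 of 3 subjects
rank2 <- rank_k_accuracy(prob, truth, k = 2)  # 2 of 3 subjects
```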

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets when we don’t know when people are walking?
  • Can we accurately find walking and count steps in free-living datasets?
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we accurately find walking and count steps in free-living datasets?

Validation: identifying walking in free-living datasets

5 open-source algorithms, 3 datasets with gold-standard step counts

Koffman and Muschelli (2024)

Application: identifying walking and counting steps in NHANES

  • \(>15{,}000\) participants
  • \(7\) days of wrist accelerometry
  • \(10\) TB of data
  • Over 1 year of computation time
  • Open source pipeline
  • Open source data repository
  • First nationally representative estimate of steps in the US population
Koffman and Muschelli (2025a)

How many steps does the average American take per day?

Do estimates differ by algorithm?

Are more steps associated with lower mortality risk?

Koffman, Crainiceanu, and Muschelli (2024)

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
  • Can we apply these methods in other, non-accelerometry datasets?

Can we generalize conclusions from free-living accelerometry data to the US population?

Sex differences in steps?

Do males take more steps than females? At what points during the day?

Function on scalar regression

Implementation: Fast univariate inference (FUI) (Cui et al. 2021)

\[\mathbb{E}[\mathrm{steps}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{gender}_i + \beta_2(s)\mathrm{age}_i \]

\(i\): participant

\(s \in \{1, \dots, 1440\}\): each minute of the day

Fit a separate GLM at each minute \(s\), then smooth the resulting point estimates across \(s\) to estimate the effects of age and sex on the daily step profile

Bootstrap subjects to get confidence bands
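
The three steps (pointwise fits, smoothing, subject bootstrap) can be sketched in base R. This is a simplified illustration on simulated data, not the FUI software: the Gaussian model, the smoothing spline, the 50 bootstrap replicates, and all object names are assumptions, and the resulting band is pointwise rather than joint.

```r
set.seed(2)
n <- 200
minutes <- 1:1440
age <- runif(n, 20, 80)
gender <- rbinom(n, 1, 0.5)

# Simulated minute-level steps (hypothetical): the gender effect peaks midday.
true_beta1 <- 2 * exp(-((minutes - 720) / 200)^2)
steps_mat <- 5 + outer(gender, true_beta1) - 0.02 * age +
  matrix(rnorm(n * length(minutes)), nrow = n)

fit_smooth <- function(rows) {
  # lm() with a matrix response fits one regression per column (per minute).
  raw <- coef(lm(steps_mat[rows, ] ~ gender[rows] + age[rows]))
  # Smooth each coefficient function across minutes.
  t(apply(raw, 1, function(b) smooth.spline(minutes, b)$y))
}

beta_hat <- fit_smooth(seq_len(n))  # 3 x 1440: intercept, gender, age

# Resample subjects (rows), refit, and keep the smoothed gender effect.
boot <- replicate(50, fit_smooth(sample(n, replace = TRUE))[2, ])
band <- apply(boot, 1, quantile, probs = c(0.025, 0.975))  # pointwise 95% band
```

Resampling whole subjects, rather than individual minutes, keeps each person's within-day correlation intact in every bootstrap replicate.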

BUT: NHANES is not a simple random sample

  • Individuals are sampled in geographic clusters

  • Minority groups are oversampled

Are our estimates valid for population-level inference?
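
A small simulation shows why the question matters. The sketch below covers only the oversampling issue, not geographic clustering, and every number in it (group sizes, step means, selection probabilities) is made up for illustration.

```r
set.seed(3)
# Hypothetical population: group B is 20% of people and takes fewer steps.
N <- 100000
group <- ifelse(runif(N) < 0.2, "B", "A")
steps <- rnorm(N, mean = ifelse(group == "B", 6000, 9000), sd = 1500)

# Oversample group B: its members are four times as likely to be selected.
p_select <- ifelse(group == "B", 0.04, 0.01)
sampled <- runif(N) < p_select
w <- 1 / p_select[sampled]  # survey weight = inverse selection probability

pop_mean <- mean(steps)                             # target quantity
naive_mean <- mean(steps[sampled])                  # ignores the design
weighted_mean <- weighted.mean(steps[sampled], w)   # accounts for oversampling
```

In this setup the unweighted sample mean is pulled toward the oversampled group, while the weighted mean lands close to the population value.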

For standard regression: \(\texttt{svyglm}\), \(\texttt{svycoxph}\)

For functional regression: ?

Complex survey function on scalar regression: simulation

Koffman et al. (2025)

Complex survey function on scalar regression: software

model_fit = svyfosr::svyfui(steps_mat ~ age + gender,
                            weights = survey_weight,
                            data = steps_df,
                            family = gaussian(),
                            boot_type = "BRR",
                            num_boots = 500,
                            parallel = TRUE,
                            seed = 2213)
Koffman and Muschelli (2025b)

Complex survey function on scalar regression: application

\[\mathbb{E}[\mathrm{steps}_i(s)] = \beta_0(s) + \beta_1(s)\mathrm{gender}_i + \beta_2(s)\mathrm{age}_i \]

\(i\): participant

\(s \in \{1, \dots, 1440\}\): each minute of the day

\(\beta_0(s)\): mean steps at minute \(s\) of the day for males at age 0

\(\beta_1(s)\): how many additional steps do females take compared to males, over the course of the day, controlling for age?

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?

Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?

Digital fingerprinting in NHANES

  • Use highly specific method to identify walking in NHANES (minimize false positives) (Karas et al. 2019)
  • \(N = 13{,}000\) individuals with 3 minutes walking per person
  • 3:1 train/test split
  • Logistic regression + weighting to overcome class imbalance
  • 43% rank-1 accuracy
  • 73% rank-5 accuracy
  • 97% rank-1% accuracy (correct subject is in the top 130 predictions)
  • 100% rank-5% accuracy (correct subject is in the top 650 predictions)
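
The slide does not say which weighting scheme was used. As one common possibility, the sketch below fits a one-vs-rest logistic regression with inverse-frequency weights on simulated data; the sample sizes, the single predictor, and the weight formula are assumptions for illustration.

```r
set.seed(4)
# One-vs-rest setup (hypothetical): 50 seconds from the target subject,
# 450 seconds from nine other subjects, one scalar predictor.
y <- rep(c(1, 0), times = c(50, 450))
x <- rnorm(500, mean = ifelse(y == 1, 1, 0))

# Inverse-frequency weights: each class contributes equally to the likelihood.
w <- ifelse(y == 1, sum(y == 0) / sum(y == 1), 1)  # 9 for positives, 1 otherwise

fit_unweighted <- glm(y ~ x, family = binomial())
fit_weighted   <- glm(y ~ x, family = binomial(), weights = w)

p_unweighted <- fitted(fit_unweighted)
p_weighted   <- fitted(fit_weighted)
```

Without weights, the fitted probabilities for the target subject are pulled toward the 10% base rate. With \(N = 13{,}000\) subjects the base rate for each one-vs-rest model is far smaller, so the imbalance is much more severe.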

Koffman, Muschelli, and Crainiceanu (2025)

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
    \(\rightarrow\) Yes!
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?

Can we apply these methods in other, non-accelerometry datasets?

Arterial waveform

Fingerprinting with arterial waveform

  • Obtain predictors for many different lags and cut points
  • Keep predictors that are among the top 10 contributors to any of the first 30 principal components (\(\approx 100\) predictors)
  • Fit XGBoost model on 727 patients
  • Mean (SD) \(7 (1.8)\) minutes per patient, range \(3\)-\(16\) minutes
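
The PCA-based screening step might look like the sketch below. It uses a random stand-in for the predictor matrix and ranks contributions by absolute loading; the matrix dimensions and that ranking rule are assumptions, since the slide does not define "contributor".

```r
set.seed(5)
# Hypothetical predictor matrix: 500 observations of 300 candidate predictors.
X <- matrix(rnorm(500 * 300), nrow = 500)

pca <- prcomp(X, center = TRUE, scale. = TRUE)
loadings <- pca$rotation[, 1:30]  # predictors x PCs

# For each PC, take the 10 predictors with the largest absolute loadings,
# then keep the union across the 30 PCs.
top_per_pc <- apply(abs(loadings), 2, function(l) {
  order(l, decreasing = TRUE)[1:10]
})
selected <- sort(unique(as.vector(top_per_pc)))
X_selected <- X[, selected, drop = FALSE]
```

Because the same predictor can rank highly on several components, the union is usually smaller than \(30 \times 10 = 300\).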

Outline


  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer? (Digital fingerprinting)
    \(\rightarrow\) Yes!
  • Can we identify someone from their walking pattern measured by a wrist-worn accelerometer in big, free-living datasets?
    \(\rightarrow\) Yes!
  • Can we accurately find walking and count steps in free-living datasets?
    \(\rightarrow\) Yes!
  • Can we generalize conclusions from free-living accelerometry data to the US population?
    \(\rightarrow\) Yes!
  • Can we apply these methods in other, non-accelerometry datasets?
    \(\rightarrow\) Yes!

Future Directions

  • Using changes in fingerprint (both walking and waveform) to predict changes in function
  • Designing real-time interventions based on hemodynamic patterns
  • Extending survey FoSR to longitudinal outcomes
  • Standardizing processing and analysis pipelines for wearable accelerometry

Thank you!


References

Cui, Erjia, Andrew Leroux, Ekaterina Smirnova, and Ciprian M. Crainiceanu. 2021. “Fast Univariate Inference for Longitudinal Functional Models.” Journal of Computational and Graphical Statistics 31 (1): 219–30. https://doi.org/10.1080/10618600.2021.1950006.
Karas, Marta, Marcin Strączkiewicz, William Fadel, Jaroslaw Harezlak, Ciprian M. Crainiceanu, and Jacek K. Urbanek. 2019. “Adaptive Empirical Pattern Transformation (ADEPT) with Application to Walking Stride Segmentation.” Biostatistics 22 (2): 331–47. https://doi.org/10.1093/biostatistics/kxz033.
Koffman, Lily, Ciprian Crainiceanu, and Andrew Leroux. 2024. “Walking Fingerprinting.” Journal of the Royal Statistical Society Series C: Applied Statistics 73 (5): 1221–41. https://doi.org/10.1093/jrsssc/qlae033.
Koffman, Lily, Ciprian Crainiceanu, and John Muschelli. 2024. “Comparing Step Counting Algorithms for High-Resolution Wrist Accelerometry Data in NHANES 2011–2014.” Medicine & Science in Sports & Exercise 57 (4): 746–55. https://doi.org/10.1249/mss.0000000000003616.
Koffman, Lily, Sunan Gao, Xinkai Zhou, Andrew Leroux, Ciprian Crainiceanu, and John Muschelli III. 2025. “Function on Scalar Regression with Complex Survey Designs.” https://arxiv.org/abs/2511.05487.
Koffman, Lily, and John Muschelli. 2024. “Evaluating Step Counting Algorithms on Subsecond Wrist-Worn Accelerometry: A Comparison Using Publicly Available Data Sets.” Journal for the Measurement of Physical Behaviour 7 (1). https://doi.org/10.1123/jmpb.2024-0008.
———. 2025a. “Minute Level Step Counts and Physical Activity Data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014.” PhysioNet. https://doi.org/10.13026/9N0R-TV02.
———. 2025b. Svyfosr: Survey-Weighted Function on Scalar Regression. https://github.com/jhuwit/svyfosr.
Koffman, Lily, John Muschelli, and Ciprian Crainiceanu. 2025. “Walking Fingerprinting Using Wrist Accelerometry During Activities of Daily Living in NHANES.” https://arxiv.org/abs/2506.17160.
Koffman, Lily, Yan Zhang, Jaroslaw Harezlak, Ciprian Crainiceanu, and Andrew Leroux. 2023. “Fingerprinting Walking Using Wrist-Worn Accelerometers.” Gait & Posture 103 (June): 92–98. https://doi.org/10.1016/j.gaitpost.2023.05.001.
Wood, Simon N. 2016. “P-Splines with Derivative Based Penalties and Tensor Product Smoothing of Unevenly Distributed Data.” Statistics and Computing 27 (4): 985–89. https://doi.org/10.1007/s11222-016-9666-x.